There have been many great developments and considerable progress in the teaching of Statistics in the past two to three decades, even if we would wish for greater penetration and better understanding of statistical literacy and statistical thinking. However, Statistics seems to be particularly susceptible to the type of leftovers which may have been appropriately tasty in the past, but which hang around way past their use-by date. This susceptibility may be part of the puzzle as to why the extensive discussion and understanding of the nature of statistical literacy and thinking is not sufficiently widespread, and has been somewhat neglected in the rush to embrace the latest appellations, including data literacy. So here are just a few small examples of statistical leftovers which should be checked for edibility and either 'eaten' – in the sense of correctly absorbed – or thrown out. I will focus on topics popular at school and introductory tertiary level, but I will not include those which have received considerable and ongoing attention elsewhere, such as poor, incorrect or inappropriate graphical representations, and I will refer only tangentially to such matters as the ongoing discussion of p-values and effect sizes.

A measure of location is not necessarily a measure of centre. This is particularly relevant in considering the myths about modes. A mode can be viewed as a measure of location, albeit not necessarily a very good one, but it may be a very poor measure of centre. For a discrete variable (categorical or count) and data on it, one or more of the most likely (theoretical) or most frequent (data) values is a key feature, but these can be poor measures of 'centre'. For the distribution of a continuous variable, a mode is a local maximum of the probability density function, and again could be anywhere. In addition, for data from a continuous variable, the situation is even more complex because we almost always have to group observations into classes (or bins, as they are called for a histogram) to obtain a picture of the data and a pictorial 'estimate' of the density function. Different groupings can produce different pictures, except in very large data sets. A data modal class is a property of, and defined only in terms of, the chosen groupings, as well as being a poor measure of centre. Hence, not only should mode not be taught as a measure of centre, but a case can be made for restricting the concept to discrete variables and their data until the concept of a density function is introduced.

So how did mode come to be included as a measure of centre? Most likely this goes back to the identification of useful classes of distributions, such as the important Pearson (1895, and subsequent papers) system, which eventually covered many of the most commonly-used continuous distributions, many of which are unimodal. Indeed, there are two main ways of producing new families of continuous distributions: one through the density function, thus involving consideration of modes, and the other through transformations of variables, and hence via quantile functions.

An associated comment about histograms, especially for school level, is appropriate here. Histograms are not bar charts (see Humphrey et al., 2014, for discussion of this and related points). The fact that the picture of the data given by a histogram is dependent, often heavily, on the choice of bins – both bin width and starting point – should be a key emphasis, and 'played with' as soon as histograms are introduced.
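A minimal sketch of such 'playing', assuming Python with numpy and matplotlib (the data, seed and bin choices here are purely illustrative):

```python
# One modest, skewed sample; three bin choices; three different 'pictures'.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=1.5, size=60)

bin_choices = [
    np.arange(0, 17, 2.0),     # width 2, starting at 0
    np.arange(-1, 17, 2.0),    # same width, shifted starting point
    np.arange(0, 16.1, 0.75),  # narrower bins
]
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, bins in zip(axes, bin_choices):
    ax.hist(data, bins=bins, edgecolor="black")
    ax.set_title(f"width {bins[1] - bins[0]:g}, start {bins[0]:g}")
plt.tight_layout()
plt.show()
```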
Students should see how the effects on the data 'picture' of changing groupings and starting points can be quite startling. Also, whether the x-axis is labelled by midpoints or starting points of bins is irrelevant – certainly neither one nor the other is 'correct' – provided, as always, that it is quite clear what is being used.

Another comment about histograms is that trying to infer from them the sizes of, or relationships between, measures such as the standard deviation, or the mean compared with the median, is both futile and meaningless. The classic example is when changing just one observation hardly alters the 'picture' but reverses the order of mean and median. This leads to another two myths: that the difference between mean and median has the same sign as the third central moment (including whether both are zero or not), and that, in unimodal distributions, mean, median and mode always occur in this or the reverse order. Neither statement is true in general, whether for distributions or data. Again, these statements arose from very early work with distribution families for which they tended to hold, and the particular has been erroneously generalised. There are conditions under which they hold, but it is easy to find distributions, and of course many, many data sets, for which they don't.

Skewness to the right (left) is the tendency for a distribution or data set to have a longer 'tail' to the right (left). There are many possible measures of this, most of which refer to some measure of centre – for example, mean or median, or adapted versions of either – and their signs do not always coincide. Even, indeed especially, for distributions on the half-line, the signs of such measures can separately reverse as parameter values change. For data, sometimes all that students can say is that the data appear to be asymmetric – students should not be forced to choose between skew to the left, skew to the right, or symmetric. Indeed, students enjoy the challenge of constructing data sets to counter the myths; a small sketch of one such construction appears below.

Some terminologies that originally served a purpose but now cause students confusion should be corrected or 'retired'. The term 'population' should now only be used when there is a genuine population. Otherwise, 'general situation' or 'data-generating process' (as discussed by Lu and Henning, 2013) avoids lasting student misunderstandings and confusion. Another very confusing group of terminologies for students are those relating to the so-called 'different' probabilities – theoretical, experimental, subjective – where the reference should be to how probabilities can be assigned: by assumptions in models, by direct estimation, by subjective assignation, or by a combination of assumption, model and estimation. There are not different types of probabilities – just different pathways in assigning probabilities. 'Sample space' is another term which is overdue for a make-over.
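To return to the myths about mean, median and skewness: a minimal sketch, assuming Python with numpy, of one tiny data set that counters two myths at once (the particular numbers are just one of many possible constructions):

```python
import numpy as np

x = np.array([0, 0, 0, 5, 5, 5, 5, 12], dtype=float)

def third_central_moment(a):
    return np.mean((a - a.mean()) ** 3)

# Myth: 'mean minus median has the same sign as the third central moment'.
print(x.mean() - np.median(x))    # -1.0 (mean 4.0 below median 5.0) ...
print(third_central_moment(x))    # ... yet +40.5: opposite signs.

# Myth: histogram 'pictures' reveal the order of mean and median.
# Moving the single largest point barely alters a wide-binned histogram,
# but reverses the order of mean and median.
y = x.copy()
y[-1] = 21.0
print(y.mean(), np.median(y))     # mean 5.125 now above median 5.0
```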
We could then turn to improving understanding of some very fundamental concepts and results. For example, emphasis from the beginning that all probabilities refer to a particular situation or reference framework leads to understanding that all probabilities are conditional probabilities, and that great care must be taken in working with or comparing probabilities. This is akin to ensuring absolute clarity, in handling percentages, about just what the percentages are being taken of. As Sir David Spiegelhalter has often commented, and as all those involved in learning support know, the handling of percentages is a significant quantitative weakness for many, precisely because of not being able to work with different 'denominators'.

Conditional probabilities should be introduced gradually across educational levels, always via conditioning language, data and relative frequencies. Whether one uses the approach of expected frequencies, conditioning language, percentages or chance, depending on what is preferred for a particular context, the essential 'formula' is P(AB) = P(A|B)P(B); the first sketch at the end of this piece shows it at work with expected frequencies. Never should the term 'multiplication rule' be used, and never should P(AB) = P(A)P(B) be introduced before conditional probability just 'because it's easier'. Nothing could be further from the truth. As illustrated in Joerg Meyer's article in this issue, sometimes independence is assumed in modelling a situation, and sometimes it is far from intuitive, because it refers to the probability structure and not just the description of events.

An example of a result which I find I often have to explain to those educated in earlier eras is that it is the normal approximation to the distribution of the sample proportion that gives the commonly-used approximate confidence intervals and hypothesis tests for the theoretical proportion (usually a case in which the word population is appropriate), and that the normal approximation to the binomial follows from this. That is, students can go to the confidence interval or test for p without going via the normal approximation to the binomial; the second sketch at the end of this piece makes the calculation explicit. I will never forget a gruelling session with a senior mathematician who insisted he knew about teaching statistics, but who still did not really understand this after two hours.

The above confusion is also associated with the need for fundamental understanding of the different types of variables and data, and the importance of this understanding for all of statistics – from the earliest graphs, to the data storage and handling challenges of data science, to the most sophisticated of statistical analyses. Just two simple and very common points of misunderstanding: Bernoulli data are not the same as data from a binomial variable, and the very first assumption of so many much-used statistical analyses is that the response variable is at least continuous.

And finally, at least for now: for a long time statisticians have been urging the use in all teaching of many-variabled real data sets in real contexts, and moving as quickly as possible, via technology, to understanding the concepts, assumptions and application of their statistical exploration and analysis, including correct interpretation of p-values and effect sizes. When debate about pedagogy gets bogged down in the 'one'- and 'two-sample' mire, is it any wonder that some in data science can claim that data science is not statistics?
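Finally, the two sketches promised above, with hypothetical counts chosen only to keep the arithmetic transparent. The first reads the essential 'formula' P(AB) = P(A|B)P(B) directly off expected frequencies: the same 90 cases, with two different denominators.

```python
# Expected frequencies: of 1000 hypothetical cases, B occurs in 100,
# and A occurs in 90 of those 100.
n = 1000
n_B = 100                   # so P(B) = 0.10
n_AB = 90                   # A-and-B cases, all of them within the B cases

p_B = n_B / n               # 0.10
p_A_given_B = n_AB / n_B    # 0.90 -- the 90 cases out of the 100 B cases
p_AB = n_AB / n             # 0.09 -- the same 90 cases out of all 1000

assert abs(p_AB - p_A_given_B * p_B) < 1e-12  # P(AB) = P(A|B)P(B)
```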
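The second sketch goes straight from the sample proportion to the usual approximate 95% confidence interval for the theoretical proportion p, with no detour via the normal approximation to the binomial (again, the counts are purely illustrative):

```python
from math import sqrt

n, successes = 200, 130
p_hat = successes / n                 # sample proportion, here 0.65
se = sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error of p_hat
z = 1.96                              # approximate 0.975 normal quantile

print(f"{p_hat:.3f} +/- {z * se:.3f}")  # 0.650 +/- 0.066
```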